AITopics

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre:

Overview (1.00)
Workflow (0.93)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Neural Information Processing SystemsFeb-18-2026, 17:12:29 GMT

f4a04396c2ed1342a5d8d05e94cb6101-Paper-Datasets_and_Benchmarks_Track.pdf

data mining, large language model, machine learning, (21 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Texas (0.04)
(5 more...)

Genre:

Workflow (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.94)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Neural Information Processing SystemsFeb-18-2026, 00:20:19 GMT

A Author Statement 668 We bear all responsibilities for the content, licensing, distribution, and maintenance of our datasets 669 in S

Q6 How many instances are there in total (of each type, if appropriate)?

artificial intelligence, machine learning, natural language, (19 more...)

Genre: Research Report (0.50)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing SystemsFeb-12-2026, 11:07:56 GMT

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

Yitian Yuan, Lin Ma, Jingwen Wang, Wei Liu, Wenwu Zhu

Neural Information Processing Systems http://nips.cc/

feature map, video, video content, (16 more...)

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)

Neural Information Processing SystemsDec-24-2025, 20:53:09 GMT

ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events.Specifically, ReXTime focuses on reasoning across time, i.e. human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across video segments, poses significant challenges to even the frontier multimodal large language models. To facilitate this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, significantly reducing the need for labor-intensive manual annotations. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models outperform academic models, they still lag behind human performance by a significant 14.3\% accuracy gap. Additionally, our pipeline creates a training dataset of 9,695 machine generated samples without manual effort, which empirical studies suggest can enhance the across-time reasoning via fine-tuning.

artificial intelligence, large language model, natural language, (8 more...)

Genre: Research Report (0.60)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)

arXiv.org Artificial IntelligenceNov-18-2025

GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory

Yeo, Jeong Hun, Chung, Sangyun, Park, Sungjune, Kim, Dae Hoe, Moon, Jinyoung, Ro, Yong Man

Long-video understanding remains a significant challenge for Multimodal Large Language Models (MLLMs) due to inherent token limitations and the complexity of capturing long-term temporal dependencies. Existing methods often fail to capture the global context and complex event relationships necessary for deep video reasoning. To address this, we introduce GCAgent, a novel Global-Context-Aware Agent framework that achieves comprehensive long-video understanding. Our core innovation is the Schematic and Narrative Episodic Memory. This memory structurally models events and their causal and temporal relations into a concise, organized context, fundamentally resolving the long-term dependency problem. Operating in a multi-stage Perception-Action-Reflection cycle, our GCAgent utilizes a Memory Manager to retrieve relevant episodic context for robust, context-aware inference. Extensive experiments confirm that GCAgent significantly enhances long-video understanding, achieving up to 23.5\% accuracy improvement on the Video-MME Long split over a strong MLLM baseline. Furthermore, our framework establishes state-of-the-art performance among comparable 7B-scale MLLMs, achieving 73.4\% accuracy on the Long split and the highest overall average (71.9\%) on the Video-MME benchmark, validating our agent-based reasoning paradigm and structured memory for cognitively-inspired long-video understanding.

large language model, machine learning, natural language, (21 more...)

2511.12027

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Consumer Health (0.66)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(2 more...)

arXiv.org Artificial IntelligenceOct-31-2025

DARTS: A Drone-Based AI-Powered Real-Time Traffic Incident Detection System

Li, Bai, Kourtellis, Achilleas, Cao, Rong, Post, Joseph, Porter, Brian, Zhang, Yu

Rapid and reliable incident detection is critical for reducing crash-related fatalities, injuries, and congestion. However, conventional methods, such as closed-circuit television, dashcam footage, and sensor-based detection, separate detection from verification, suffer from limited flexibility, and require dense infrastructure or high penetration rates, restricting adaptability and scalability to shifting incident hotspots. To overcome these challenges, we developed DARTS, a drone-based, AI-powered real-time traffic incident detection system. DARTS integrates drones' high mobility and aerial perspective for adaptive surveillance, thermal imaging for better low-visibility performance and privacy protection, and a lightweight deep learning framework for real-time vehicle trajectory extraction and incident detection. The system achieved 99% detection accuracy on a self-collected dataset and supports simultaneous online visual verification, severity assessment, and incident-induced congestion propagation monitoring via a web-based interface. In a field test on Interstate 75 in Florida, DARTS detected and verified a rear-end collision 12 minutes earlier than the local transportation management center and monitored incident-induced congestion propagation, suggesting potential to support faster emergency response and enable proactive traffic control to reduce congestion and secondary crash risk. Crucially, DARTS's flexible deployment architecture reduces dependence on frequent physical patrols, indicating potential scalability and cost-effectiveness for use in remote areas and resource-constrained settings. This study presents a promising step toward a more flexible and integrated real-time traffic incident detection system, with significant implications for the operational efficiency and responsiveness of modern transportation management.

data mining, machine learning, real time system, (17 more...)

2510.26004

Country: North America > United States > Florida > Hillsborough County > Tampa (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Health & Medicine (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Goulart, Paimon, Steinhauser, Jordan, Shuler, Kylene, Korzus, Edward, Chen, Jia, Papalexakis, Evangelos E.

Preliminary Use of Vision Language Model Driven Extraction of Mouse Behavior Towards Understanding Fear Expression

arXiv.org Artificial IntelligenceOct-23-2025

Integration of diverse data will be a pivotal step towards improving scientific explorations in many disciplines. This work establishes a vision-language model (VLM) that encodes videos with text input in order to classify various behaviors of a mouse existing in and engaging with their environment. Importantly, this model produces a behavioral vector over time for each subject and for each session the subject undergoes. The output is a valuable dataset that few programs are able to produce with as high accuracy and with minimal user input. Specifically, we use the open-source Qwen2.5-VL model and enhance its performance through prompts, in-context learning (ICL) with labeled examples, and frame-level preprocessing. We found that each of these methods contributes to improved classification, and that combining them results in strong F1 scores across all behaviors, including rare classes like freezing and fleeing, without any model fine-tuning. Overall, this model will support interdisciplinary researchers studying mouse behavior by enabling them to integrate diverse behavioral features, measured across multiple time points and environments, into a comprehensive dataset that can address complex research questions.

artificial intelligence, large language model, natural language, (16 more...)

2510.1916

Country: North America > United States > California (0.17)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Lin, Tingyu, Peer, Marco, Kleber, Florian, Sablatnig, Robert

ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents

arXiv.org Artificial IntelligenceOct-20-2025

This paper presents ClapperText, a benchmark dataset for handwritten and printed text recognition in visually degraded and low-resource settings. The dataset is derived from 127 World War II-era archival video segments containing clapperboards that record structured production metadata such as date, location, and camera-operator identity. ClapperText includes 9,813 annotated frames and 94,573 word-level text instances, 67% of which are handwritten and 1,566 are partially occluded. Each instance includes transcription, semantic category, text type, and occlusion status, with annotations available as rotated bounding boxes represented as 4-point polygons to support spatially precise OCR applications. Recognizing clapperboard text poses significant challenges, including motion blur, handwriting variation, exposure fluctuations, and cluttered backgrounds, mirroring broader challenges in historical document analysis where structured content appears in degraded, non-standard forms. We provide both full-frame annotations and cropped word images to support downstream tasks. Using a consistent per-video evaluation protocol, we benchmark six representative recognition and seven detection models under zero-shot and fine-tuned conditions. Despite the small training set (18 videos), fine-tuning leads to substantial performance gains, highlighting ClapperText's suitability for few-shot learning scenarios. The dataset offers a realistic and culturally grounded resource for advancing robust OCR and document understanding in low-resource archival contexts. The dataset and evaluation code are available at https://github.com/linty5/ClapperText.

machine learning, pattern recognition, recognition, (18 more...)

2510.15557

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)